Fully Bayesian Logistic Regression with Hyper-Lasso Priors for High-dimensional Feature Selection
High-dimensional feature selection arises in many areas of modern science.
For example, in genomic research we want to find the genes that can be used to
separate tissues of different classes (e.g. cancer and normal) from tens of
thousands of genes that are active (expressed) in certain tissue cells. To this
end, we wish to fit regression and classification models with a large number of
features (also called variables or predictors). In the past decade, penalized
likelihood methods for fitting regression models based on hyper-LASSO
penalization have received increasing attention in the literature. However,
fully Bayesian methods that use Markov chain Monte Carlo (MCMC) remain
underdeveloped. In this paper we introduce an MCMC
(fully Bayesian) method for learning severely multi-modal posteriors of
logistic regression models based on hyper-LASSO priors (non-convex penalties).
Our MCMC algorithm uses Hamiltonian Monte Carlo in a restricted Gibbs sampling
framework; we call our method Bayesian logistic regression with hyper-LASSO
(BLRHL) priors. We have used simulation studies and real data analysis to
demonstrate the superior performance of hyper-LASSO priors, and to investigate
the issues of choosing heaviness and scale of hyper-LASSO priors.
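To make the setup concrete, here is a minimal sketch assuming a Student-t form for the hyper-LASSO prior, with illustrative `df` (heaviness) and `scale` settings; it uses plain Hamiltonian Monte Carlo rather than the restricted Gibbs framework of BLRHL, and is our illustration rather than the authors' code.

```python
# A minimal sketch, assuming a Student-t form for the hyper-LASSO prior;
# `df` (heaviness) and `scale` are illustrative settings, not the paper's.
import numpy as np

def log_post_and_grad(beta, X, y, df=1.0, scale=0.05):
    """Log posterior (up to a constant) and gradient for logistic
    regression with independent Student-t priors on the coefficients."""
    z = X @ beta
    log_lik = np.sum(y * z - np.logaddexp(0.0, z))        # Bernoulli, logit link
    grad_lik = X.T @ (y - 1.0 / (1.0 + np.exp(-z)))
    t2 = beta**2 / (df * scale**2)
    log_prior = -0.5 * (df + 1.0) * np.sum(np.log1p(t2))  # heavy-tailed, non-convex
    grad_prior = -(df + 1.0) * beta / (df * scale**2 * (1.0 + t2))
    return log_lik + log_prior, grad_lik + grad_prior

def hmc_step(beta, X, y, eps=0.01, n_leap=20, rng=np.random):
    """One Hamiltonian Monte Carlo update (leapfrog + accept/reject)."""
    p = rng.standard_normal(beta.shape)
    logp0, g = log_post_and_grad(beta, X, y)
    h0 = logp0 - 0.5 * p @ p              # log target minus kinetic energy
    b = beta.copy()
    p = p + 0.5 * eps * g                 # initial half step for momentum
    for _ in range(n_leap):
        b = b + eps * p
        logp, g = log_post_and_grad(b, X, y)
        p = p + eps * g
    p = p - 0.5 * eps * g                 # trim the extra half step
    h1 = logp - 0.5 * p @ p
    return b if np.log(rng.uniform()) < h1 - h0 else beta
```

The severe multi-modality of the posterior comes from this non-convex prior: plain HMC like this can stall in a single mode, which is exactly the issue the paper's restricted Gibbs scheme is designed to handle.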
A Method for Compressing Parameters in Bayesian Models with Application to Logistic Sequence Prediction Models
Bayesian classification and regression with high-order interactions is
largely infeasible because Markov chain Monte Carlo (MCMC) would need to be
applied with a great many parameters, whose number increases rapidly with the
order. In this paper we show how to make it feasible by effectively reducing
the number of parameters, exploiting the fact that many interactions have the
same values for all training cases. Our method uses a single "compressed"
parameter to represent the sum of all parameters associated with a set of
patterns that have the same value for all training cases. Using symmetric
stable distributions as the priors of the original parameters, we can easily
find the priors of these compressed parameters. We therefore need to deal only
with a much smaller number of compressed parameters when training the model
with MCMC. The number of compressed parameters may converge (stop growing)
before the highest possible order is considered. After training the model, we
can split
these compressed parameters into the original ones as needed to make
predictions for test cases. We show in detail how to compress parameters for
logistic sequence prediction models. Experiments on both simulated and real
data demonstrate that a huge number of parameters can indeed be reduced by our
compression method.
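A minimal sketch of the compression step (function and variable names are our assumptions, not the paper's code): indicator columns of interaction patterns that are identical across all training cases are merged, and because a sum of independent symmetric alpha-stable variables is again symmetric alpha-stable, the compressed parameter's prior scale is simply (sum_j s_j^alpha)^(1/alpha).

```python
# A minimal sketch of the compression step; function and variable names
# are our assumptions, not the paper's code. alpha=1 corresponds to
# Cauchy priors, for which the prior scales simply add.
import numpy as np
from collections import defaultdict

def compress_columns(Z, scales, alpha=1.0):
    """Merge identical columns of the 0/1 design matrix Z (cases x
    patterns). Each group of identical columns is represented by one
    column and one compressed parameter whose symmetric-stable prior
    scale is (sum_j s_j^alpha)^(1/alpha)."""
    groups = defaultdict(list)
    for j in range(Z.shape[1]):
        groups[Z[:, j].tobytes()].append(j)
    cols, comp_scales, members = [], [], []
    for idx in groups.values():
        cols.append(Z[:, idx[0]])
        comp_scales.append(np.sum(scales[idx] ** alpha) ** (1.0 / alpha))
        members.append(idx)   # retained so parameters can be split back
                              # into the originals for test-case prediction
    return np.column_stack(cols), np.array(comp_scales), members

# Toy usage: 500 interaction patterns collapse to a handful of columns.
rng = np.random.default_rng(1)
Z = rng.integers(0, 2, size=(20, 3))[:, rng.integers(0, 3, size=500)]
Zc, sc, members = compress_columns(Z, np.full(500, 0.1))
print(Z.shape, "->", Zc.shape)   # (20, 500) -> (20, 3) or fewer
```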
Bayesian Classification and Regression with High Dimensional Features
This thesis responds to the challenges of using a large number, such as
thousands, of features in regression and classification problems.
There are two situations where such high dimensional features arise. One is
when high dimensional measurements are available, for example, gene expression
data produced by microarray techniques. For computational or other reasons,
people may select only a small subset of features when modelling such data, by
looking at how relevant the features are to predicting the response, based on
some measure such as correlation with the response in the training data.
Although it is used very commonly, this procedure will make the response appear
more predictable than it actually is. In Chapter 2, we propose a Bayesian
method to avoid this selection bias, with application to naive Bayes models and
mixture models.
High dimensional features also arise when we consider high-order
interactions. The number of parameters will increase exponentially with the
order considered. In Chapter 3, we propose a method for compressing a group of
parameters into a single one, by exploiting the fact that many predictor
variables derived from high-order interactions have the same values for all the
training cases. The number of compressed parameters may converge (stop
growing) before the highest possible order is considered. We apply this
compression method to
logistic sequence prediction models and logistic classification models.
We use both simulated data and real data to test our methods in both
chapters. (PhD thesis, University of Toronto.)
Approximating Cross-validatory Predictive P-values with Integrated IS for Disease Mapping Models
An important statistical task in disease mapping problems is to identify
outlier/divergent regions with unusually high or low residual risk of disease.
Leave-one-out cross-validatory (LOOCV) model assessment is a gold standard for
computing the predictive p-values that can flag such outliers. However, actual
LOOCV is time-consuming because one needs to re-simulate a Markov chain for
each posterior distribution in which an observation is held out as a test
case. This paper introduces a new method, called iIS, for approximating LOOCV
using only Markov chain samples simulated from the posterior based on the full
data set. iIS is based on importance sampling (IS): it integrates both the
p-value and the likelihood of the test observation with respect to the
distribution of the latent variable, without reference to the actual
observation. Following the general theory for IS, the predictive p-values
computed with iIS can be proved to be equivalent to the LOOCV predictive
p-values. We compare iIS with three other existing methods from the literature
on a lip cancer dataset collected in Scotland. Our empirical results show that
iIS provides predictive p-values that are almost identical to the actual LOOCV
predictive p-values, and that it outperforms the three existing methods,
including the ghosting method recently proposed by Marshall and Spiegelhalter
(2007).
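The following sketch shows the shape of the iIS estimator under assumed modelling choices (Poisson counts with Gaussian latent log-risks, a common disease-mapping setup); it is our illustration, not the paper's implementation. For each posterior draw of the hyperparameters, the p-value and the likelihood of the held-out count are both integrated over latent draws from their conditional prior, and the reciprocal of the integrated likelihood serves as the importance weight.

```python
# A minimal sketch under assumed modelling choices (Poisson counts with
# Gaussian latent log-risks); this illustrates the estimator's shape,
# it is not the paper's implementation.
import numpy as np
from scipy import stats

def iis_pvalue(y_i, E_i, mu_draws, sigma_draws, n_latent=500, rng=None):
    """Integrated-IS approximation of the LOOCV predictive p-value
    P(y_rep >= y_i | y_{-i}) for one region, using hyperparameter draws
    (mu, sigma) from the posterior based on the FULL data set."""
    rng = rng or np.random.default_rng()
    num = den = 0.0
    for mu, sigma in zip(mu_draws, sigma_draws):
        x = rng.normal(mu, sigma, size=n_latent)    # latents drawn without
                                                    # reference to y_i
        lam = E_i * np.exp(x)
        lik = stats.poisson.pmf(y_i, lam)           # p(y_i | x)
        a = stats.poisson.sf(y_i - 1, lam).mean()   # integrated P(y_rep >= y_i)
        p_y = lik.mean()                            # integrated p(y_i | theta)
        w = 1.0 / max(p_y, 1e-300)                  # IS weight: full-data
        num += w * a                                # posterior -> LOO posterior
        den += w
    return num / den

# Hypothetical usage with MCMC output mu_chain, sigma_chain:
# p_i = iis_pvalue(y_i=23, E_i=10.0, mu_draws=mu_chain, sigma_draws=sigma_chain)
```

The key design point is that neither the weight nor the p-value term conditions on the realized latent variable for the held-out region, which is what distinguishes iIS from plain importance sampling on the latent draws.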
A Method for Avoiding Bias from Feature Selection with Application to Naive Bayes Classification Models
For many classification and regression problems, a large number of features
are available for possible use - this is typical of DNA microarray data on gene
expression, for example. Often, for computational or other reasons, only a
small subset of these features is selected for use in a model, based on some
simple measure such as correlation with the response variable. This procedure
may introduce an optimistic bias, however, in which the response variable
appears to be more predictable than it actually is, because the high
correlation of the selected features with the response may be partly or wholly
due to chance. We show how this bias can be avoided when using a Bayesian model
for the joint distribution of features and response. The crucial insight is
that even if we forget the exact values of the unselected features, we should
retain, and condition on, the knowledge that their correlation with the
response was too small for them to be selected. In this paper we describe how
this idea can be implemented for "naive Bayes" models of binary data.
Experiments with simulated data confirm that this method avoids bias due to
feature selection. We also apply the naive Bayes model to subsets of data
relating gene expression to colon cancer, and find that correcting for bias
from feature selection does improve predictive performance.
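The bias being corrected here is easy to reproduce. A small pure-noise simulation (our illustration, not the paper's experiments): with many features and few cases, the features most correlated with the response in the training set look strongly predictive even when the response is independent of every feature.

```python
# A small pure-noise simulation (our illustration) of the bias: the
# response is independent of every feature, yet the selected features
# look strongly correlated with it on the training data.
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 40, 2000, 10
X = rng.standard_normal((n, p))
y = rng.integers(0, 2, size=n)              # response unrelated to X

r = np.corrcoef(X.T, y)[-1, :-1]            # each feature's correlation with y
top = np.argsort(-np.abs(r))[:k]            # pick the k most correlated
print("selected |r| on training data:", np.abs(r[top]).mean())   # well above 0

X_new = rng.standard_normal((n, p))         # fresh data from the same null
y_new = rng.integers(0, 2, size=n)
r_new = np.corrcoef(X_new[:, top].T, y_new)[-1, :-1]
print("same features on fresh data:  ", np.abs(r_new).mean())    # near 0
```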